Broadcasting Messages in Fault-Tolerant Distributed Systems: The Benefit of Handling Input-Triggered and Output-Triggered Suspicions Differently
Authors
Abstract
This paper investigates the two main and seemingly antagonistic approaches to broadcasting messages in fault-tolerant distributed systems: the approach based on Reliable Broadcast, and the one based on View Synchronous Communication (VSC for short). We discuss both communication primitives in a system model with fair-lossy channels, which leads us to introduce the “time-bounded buffering” problem: VSC addresses this problem, but Reliable Broadcast does not. Moreover, we show that VSC solves Reliable Broadcast in a system model with “program-controlled crash”. However, VSC does more than Reliable Broadcast, and this has a cost. We analyse this cost by distinguishing between two types of failure suspicions: input-triggered failure suspicions, which are related to incoming messages, and output-triggered failure suspicions, which are related to outgoing messages. We show that VSC has not managed to exploit the difference between these two types of failure suspicions, and so has been unable to resolve the dilemma between (1) short fail-over time and (2) infrequent incorrect exclusion of processes from the membership. We show how to escape from this dilemma by replacing the standard VSC broadcast primitive with two broadcast primitives, one sensitive to input-triggered suspicions and the other sensitive to output-triggered suspicions. This makes it possible to get the best of both worlds.
Similar resources
Scheduling Strategy in Fault Tolerant Time Triggered Architectures
Reliable safety-critical systems are designed for fault tolerance and impose high dependability and reliability requirements. Scheduling such systems is hard in traditional event-triggered systems and becomes more difficult as complexity increases. Time-triggered systems with timing guaranteed in task activation, avoiding the risk of missing a hard-critical deadline, overcoming the problem of time ...
Time vs. Space in Fault-Tolerant Distributed Systems
Algorithms for solving agreement problems can be classified in two categories: (1) those relying on failure detectors that we call FD-based, and (2) those that rely on a Group Membership Service that we call GMS-based. The paper discusses the advantages and limitations of these two approaches, and proposes an extension to the GMS-approach that combines the advantages of both approaches, without...
Fault-Tolerant Clock Synchronization for Embedded Distributed Multi-Cluster Systems
When time-triggered (TT) systems are to be deployed for large embedded real-time (RT) control systems in cars and airplanes, one way to overcome bandwidth limitations and achieve complexity reduction is the organization in clusters of strongly interacting computing nodes with well-defined interfaces. In this case, clock synchronization of different cluster times supports meaningful exchange of ...
TTP - A Protocol for Fault-Tolerant Real-Time Systems
The Time-Triggered Protocol integrates such services as predictable message transmission, clock synchronization, membership, mode change, and blackout handling. It also supports replicated nodes and replicated communication channels. Real-time control systems must share critical information among autonomous subsystems in a timely and reliable manner. For example, automotive applications have sep...
An approach to fault detection and correction in design of systems using of Turbo codes
We present an approach to the design of fault-tolerant computing systems. In this paper, a technique is employed that enables the combination of several codes, in order to obtain flexibility in the design of error-correcting codes. Code combining techniques are very effective, and turbo codes are one example of such codes. The algorithm-based fault tolerance techniques, to detect errors, rely on the c...